Abstract — In this work we consider the communication of
information in the presence of a causal adversarial jammer. In the
setting under study, a sender wishes to communicate a message to
a receiver by transmitting a codeword $\mathbf{x} = (x_1, \ldots, x_n)$
bit-by-bit over a communication channel. The sender and the receiver
do not share common randomness. The adversarial jammer can
view the transmitted bits $x_i$ one at a time, and can change up to a
$p$-fraction of them. However, the decisions of the jammer must be
made in a causal manner. Namely, for each bit $x_i$ the jammer's
decision on whether to corrupt it or not must depend only on
$x_j$ for $j \leq i$. This is in contrast to the "classical" adversarial
jamming situations in which the jammer has no knowledge of $\mathbf{x}$,
or knows $\mathbf{x}$ completely. In this work, we present upper bounds
on the capacity that hold under both the average and maximal
probability of error criteria, and for both deterministic
and stochastic encoding schemes.

Index Terms — channel coding, arbitrarily varying channels, 
jamming 



I. Introduction 

Alice wishes to transmit a message $u$ to Bob over a binary-input
binary-output channel. To do so, she encodes $u$ into a
length-$n$ binary vector $\mathbf{x}$ and transmits it over the channel.
However, the channel is controlled by a malicious adversary
Calvin who may observe the transmissions, and attempts to
jam communication by flipping up to a $p$ fraction of the bits
transmitted by Alice. Since he must act in a causal manner,
Calvin's decision on whether or not to flip the bit $x_i$ must
be a function solely of the bits $x_1, \ldots, x_i$ he has observed
thus far. This communication scenario models jamming by an
adversary who is limited in his jamming capability (perhaps
due to limited transmit energy) and is causal. This causality
assumption is reasonable for many communication channels,
both wired and wireless. Calvin can only corrupt a bit when it
is transmitted (and thus his error is based on his view so far).
To decode the transmitted message, Bob waits until all the bits
have arrived.

In this paper we investigate the information-theoretic limits
of communication in this setting. We stress that in our model
Calvin knows everything that both Alice and Bob do - there
is no shared secret or common randomness (a model where
such a shared secret may be allowed has been considered in the
literature pertaining to Arbitrarily Varying Channels, discussed
further in Section I-A). However, we make no assumptions
about the computational tractability of Alice's, Bob's, or Calvin's
encoding, decoding and jamming processes. Our main contribution
in this work is a converse that helps to make progress
towards a better understanding of the communication rates
(average number of bits per channel use) achievable against
a causal adversary. Specifically, we describe and analyze a
novel jamming strategy for Calvin and show that it (upper)
bounds the rate of communication regardless of the coding
strategy used by Alice and Bob. This jamming strategy results
in Calvin being able to force Bob's average probability of
decoding error over all of Alice's messages to be bounded
away from zero (and hence correspondingly also his maximum
probability of error).

A. Previous and related work 

Many of the following works deal with related channels; we 
restrict our discussion mostly to the binary-input binary-output 
case, except where specifically indicated otherwise. 
Coding theory model: A very strong class of adversarial
channels is one where Calvin is omniscient - he knows Alice's
entire codeword $\mathbf{x}$ prior to transmission and can tailor the
pattern of up to $pn$ bit-flips to each specific transmission.
This is the "worst-case noise" model studied in coding
theory. In this model there is no randomness in code design,
and it is desired that Bob always decodes correctly. For
binary channels, characterizing the capacity has been an open
problem for several decades. The best known upper bound is
due to McEliece et al. [2] as the solution of an LP, and the
best known achievable scheme corresponds to codes suggested
by Gilbert and Varshamov [3], [4], which achieve a rate of
$1 - H(2p)$. Improving either of these bounds would be a
significant breakthrough.^1

Information theory model: A much weaker class of adversarial
channels is one where Calvin generates bit flips in
an i.i.d. manner with probability $p$ and Bob must decode
correctly "with high probability" over the randomness in
Calvin's bit-flips. The original work of Shannon [8] effectively
characterized the capacity of this binary symmetric channel
BSC($p$). The capacity $1 - H(p)$ in Shannon's setting (for
crossover probability $p$) is strictly greater than that of the
coding theory model.

Causal adversarial model: The class of channels considered
in this work, i.e., that of causal adversaries, falls in between
the above two extremes. In one direction this is because a
causal adversary is certainly no stronger than an omniscient
adversary, since he cannot tailor his jamming strategy to
take into account Alice's future transmissions. Indeed, the
work of Haviv and Langberg [9] indicates that (for $2p <
H^{-1}(1/2) \approx 0.11$) rates strictly better than those achievable
by Gilbert-Varshamov codes [3], [4] against an omniscient
adversary are achievable against a causal adversary. However,
since it is still unknown whether Gilbert-Varshamov codes are
optimal against omniscient adversaries, it is unknown whether
causal adversaries are indeed strictly weaker than omniscient
adversaries. Nonetheless, the Gilbert-Varshamov bound and
the bound of [9] indicate that for $p < 1/4$ the capacity under
causal adversaries is bounded away from zero.

In the other direction, the causal adversarial model under
study is at least as strong as the information theoretic model
in which Calvin generates bit flips in an i.i.d. manner. Specifically,
if $p < 1/2$, for any $\delta_p > 0$ and sufficiently long
block-length $n$ a causal adversary can ignore the transmitted
codeword seen so far and just mimic the behavior of a binary
symmetric channel BSC($p - \delta_p$) - with high probability he
does not exceed his budget of $pn$ bit-flips. Similarly, if $p \geq
1/2$, Calvin simply mimics the behavior of a BSC(1/2). This
implies that when communicating in the presence of causal
adversaries with jamming capabilities that are parametrized
by $p$, $1 - H(p)$ is an upper bound on the achievable rate for
$p < 1/2$, and no positive rate is achievable for $p \geq 1/2$.
Improving over this naive upper bound (and hence narrowing
the gap to the lower bound of [9]) is the focus of the paper.
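The claim that a BSC($p - \delta_p$) stays within the budget of $pn$ flips with high probability can be checked numerically. The following is a minimal sketch, not part of the original analysis: the function names and the parameter values in the usage note are illustrative assumptions. It compares the exact binomial tail with the standard Hoeffding bound $\exp(-2\delta_p^2 n)$.

```python
import math

def binom_tail(n, q, k):
    """P[Binomial(n, q) > k], with each term computed in log-space."""
    total = 0.0
    for j in range(k + 1, n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(j + 1)
                    - math.lgamma(n - j + 1)
                    + j * math.log(q) + (n - j) * math.log(1 - q))
        total += math.exp(log_term)
    return total

def budget_violation(n, p, delta):
    """Probability that i.i.d. flips at rate p - delta exceed p*n flips,
    together with the Hoeffding upper bound exp(-2 * delta^2 * n)."""
    budget = math.floor(p * n)          # Calvin may flip at most p*n bits
    tail = binom_tail(n, p - delta, budget)
    hoeffding = math.exp(-2 * delta ** 2 * n)
    return tail, hoeffding
```

For example, `budget_violation(500, 0.1, 0.03)` returns an exact tail probability that lies below the Hoeffding bound, and both shrink as `n` grows for fixed `delta`.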

The improved upper bounds we present hold for general
coding schemes that allow Alice to encode a message $u$ to
one of several possible codewords $\mathbf{x} \in \{\mathbf{x}(u,r)\}$, where $r$
is a random source available to Alice but unknown to either
Bob or Calvin. Such general coding schemes are referred to
as stochastic coding schemes. We stress that in such schemes
there is no shared randomness between Alice and Bob, and
the source of randomness in Alice's encoder is solely known
to Alice.

Arbitrarily Varying Channels: Our model is a variant of
the arbitrarily varying channel (AVC) model [10]. The AVC

^1 As is often the case, results for channels over "large" alphabets are
significantly easier. In the "intermediate" alphabet-size regime, wherein the
alphabet is of size at least 49, advances in Algebraic-Geometry codes over the
last three decades (see for a survey) have resulted in codes exceeding the
Gilbert-Varshamov bound. For alphabets larger than $n$, the bound of $1 - 2p$
due to Singleton [6] is known to be achievable in a computationally efficient
manner via Reed-Solomon codes [7].



model where the adversary has access to the entire codeword
was considered by Ahlswede and Wolfowitz [11], [12] but
received little attention since [13, Problem 2.6.21]. General
AVC models have been extended to include channels with
constraints on the adversary (such as $pn$ bit flips) for cases
where the adversary has no access to the codeword [14], or has
access to the full codeword [15]. For binary channels in which
the jammer has knowledge of the entire codeword $\mathbf{x}$, [16]
showed that $O(\log n)$ bits of common randomness are sufficient
to achieve the optimal rate of $1 - H(p)$ (and the work in [17]
investigated computationally efficient constructions of such
codes). However, issues of causality have only been studied
in the context of randomized coding (when the encoder and
decoder share common randomness), but not for deterministic
codes or stochastic encoding.

Delayed adversaries: The delayed adversary model was studied
in [18] and [19]. In this model, the jammer's decision on
whether to corrupt $x_i$ must depend only on $x_j$ for $j \leq i - Dn$
for a delay parameter $D \in [0, 1]$. The case of $D = 0$ is exactly
the causal setting studied in this work, and that of $D = 1$
corresponds to the "oblivious adversary" studied by Langberg [18].
In this oblivious adversary setting the work of [20]
demonstrates computationally efficient code constructions that
achieve information-theoretically rate-optimal throughput of
$1 - H(p)$ for any $p < 1/2$.

In a different line of work, Dey et al. [19] showed that
for a large class of channels, the capacity for delay $D > 0$
equals that of the constrained AVC model [21]. In particular,
a positive delay implies that the optimal rate $1 - H(p)$ is
achievable against a delayed adversary over a binary-input
binary-output channel. In this paper we show that a causal
adversary is strictly stronger than a delayed adversary with
$D > 0$ for all $p > 0.0804$. For $p$ smaller than this value our
techniques do not help separate the capacity regions of these
two models.

Causal and delayed adversaries for "large alphabets": In
the large alphabet setting (where the alphabet-size is allowed
to grow without bound with increasing block-length), Dey et
al. [22] give a full characterization of the capacity-region of
several variants of both the causal adversary and the delayed
adversary models. They further give computationally efficient
codes achieving every point in the capacity regions for the
models considered. In general, in the large alphabet regime
code design is easier than in the binary alphabet regime
(that is the primary focus of this work) since with large
alphabets, a "few random hashes" can be hidden inside each
symbol with asymptotically negligible rate-loss. These hashes
aid the decoder in detecting the adversarial attack pattern and
correcting for it. In the binary alphabet setting this technique
is not applicable - this is one of the bottlenecks in further
narrowing the gaps between outer and inner bounds for the
model considered in this work.

Previous attacks: This work continues our preliminary work
on binary causal channels [23] (and a related result of Guruswami
and Smith [20]), which proposed an upper bound using
the so called "wait-and-push" attack. This work improves
on this earlier work in two aspects - specifically the bound
presented is tighter, and holds also for stochastic encoding.^2



B. Main result 

Our improved bounds are given in the following theorem,
and are depicted (in comparison with the previous bounds) in
Figure 1. For any $\bar{p} \in [0,p]$, let $\alpha(p,\bar{p}) = 1 - 4(p - \bar{p})$. In
what follows, $C(p)$ is the capacity of the causal channel under
study. For precise definitions and model see Section II.

Theorem 1 For $p \in [0,1/4]$, the capacity $C(p)$ of a binary
causal adversary channel with constraint $p$ satisfies:

$$C(p) \leq \min_{\bar{p} \in [0,p]} \left[ \alpha(p,\bar{p}) \left( 1 - H\left( \frac{\bar{p}}{\alpha(p,\bar{p})} \right) \right) \right].$$

For $p > 1/4$ the capacity $C(p) = 0$.



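A few remarks are in order. Notice that in the regime $\bar{p} \leq p <
1/4$ it holds that $\bar{p} < \alpha(p,\bar{p})$ and thus $\bar{p}/\alpha(p,\bar{p})$ in the expression
of Theorem 1 is at most of value 1. We show in Appendix A
that the optimum $\bar{p}$ in the computation of $C(p)$ is

$$\min\left\{ p, \frac{3(1 - 4p)}{2 + \sqrt[3]{1592 + 24\sqrt{33}} + \sqrt[3]{1592 - 24\sqrt{33}}} \right\} \approx \min\left\{ p, \frac{1 - 4p}{8.4445} \right\}.$$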



Namely, for $p$ greater than approximately 0.0804, the capacity
$C(p)$ is bounded away from $1 - H(p)$, and for $p$ less than this
value our bound equals $1 - H(p)$ (in the latter case we get
$\bar{p} = p$). For $p = 1/4$ the new strategy we propose for Calvin
shows that no positive rate is achievable; when $p > 1/4$ Calvin
can simply mimic the case $p = 1/4$.
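The bound of Theorem 1 is easy to evaluate numerically. The sketch below is illustrative only: the function names are assumptions, and it relies on the closed-form optimizer quoted above, with $\alpha(p,\bar{p}) = 1 - 4(p - \bar{p})$.

```python
import math

def binary_entropy(x):
    """Binary entropy H(x) in bits, with H(0) = H(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def alpha(p, p_bar):
    return 1.0 - 4.0 * (p - p_bar)

def bound_at(p, p_bar):
    """alpha * (1 - H(p_bar / alpha)) for a given babble parameter p_bar."""
    a = alpha(p, p_bar)
    if a <= 0.0:
        return 0.0
    return a * (1.0 - binary_entropy(p_bar / a))

def optimal_p_bar(p):
    """Closed-form minimizer; the denominator is about 3 * 8.4445."""
    root = math.sqrt(33)
    denom = 2 + (1592 + 24 * root) ** (1 / 3) + (1592 - 24 * root) ** (1 / 3)
    return min(p, 3 * (1 - 4 * p) / denom)

def capacity_upper_bound(p):
    """Upper bound on C(p) from Theorem 1 (zero for p >= 1/4)."""
    if p >= 0.25:
        return 0.0
    return bound_at(p, optimal_p_bar(p))
```

For $p$ below roughly 0.0804, `capacity_upper_bound(p)` coincides with `1 - binary_entropy(p)`; above that threshold it is strictly smaller.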

C. Techniques and Proof Overview 

To prove Theorem 1 we show that no matter which
encoding/decoding scheme is used by Alice and Bob, there
exists a strategy for Calvin that does not allow communication
at rate higher than $C(p)$. Specifically, we demonstrate that
whenever Alice and Bob attempt to communicate at a rate
higher than $C(p)$, there exists a causal jamming strategy (that
in general depends on Alice and Bob's encoding/decoding
strategy) that allows Calvin to enforce a constant probability
of error bounded away from zero. More precisely, for any

^2 For completeness, we specify the two major differences between this
paper and [23]. First, we propose a different two-phase attack ("babble-and-
push") which gives a tighter outer bound than the previous attack ("wait-and-
push"). In "wait-and-push," Calvin passively eavesdrops in the first phase and uses
this information to design an error vector to confuse Bob in the second phase.
In our new attack, Calvin instead injects noise in the first phase to increase
Bob's uncertainty about Alice's transmissions. However, we must carefully
choose the number of bit-flips Calvin injects in this "babble" phase to obtain
a tighter outer bound, because Calvin must trade off between using bit-flips
to increase Bob's uncertainty and to push to an alternative codeword in the
second phase. The second improvement in this paper is that we prove that
the "babble-and-push" attack works even when Alice and Bob use stochastic
encoding (i.e., for each message $u$ she has, Alice may choose to transmit one
of multiple possible codewords $\mathbf{x}(u)$, with an arbitrary random distribution
over the set of codewords). Our bounds therefore hold for general codes, as
opposed to previous work [23], where the outer bound was proved for codes
in which each message $u$ corresponded to a unique $\mathbf{x}(u)$ deterministically
chosen by Alice.

Figure 1. We plot previous bounds related to the channel at hand compared to
our bound. The upper bound of $1 - H(p)$ corresponds to the binary symmetric
channel. The lower bound [5] (denoted HL) is based on an evaluation of the
parameters specified by Haviv and Langberg [9] and it slightly improves on
the Gilbert-Varshamov bound $1 - H(2p)$. Our improved bound appears in
between.



block-length $n$, any $\epsilon > 0$ and any encoding/decoding scheme
of Alice and Bob of rate $C(p) + \epsilon$, Calvin can cause
a decoding error probability of at least $\epsilon^{O(1/\epsilon)}$.

At a high level, Calvin uses a two-phase "babble-and-push"
strategy. In the first phase of $\ell(p)$ channel uses, Calvin
"babbles" by behaving like a BSC($\bar{p}$) for some $\bar{p}$ chosen as
a function of $p$. In the second phase of $n - \ell(p)$ channel
uses, Calvin randomly selects a codeword from Alice and
Bob's codebook that is consistent with what Bob has received
so far. Calvin then "randomly pushes" the remaining part of
Alice's codeword towards his selected codeword (i.e., in every
location in the "push" phase where Alice's codeword bit differs
from his selected codeword, he adds a bit-flip with probability
half). A decoding error occurs if Calvin is able to push the
transmitted codeword half the distance towards the codeword
selected by Calvin (via a standard symmetrization argument).

Roughly speaking, the first phase allows Calvin to gain 
information regarding which codeword was transmitted by 
Alice, while the second phase allows Calvin to use this in- 
formation in order to design a corresponding symmetrization- 
based jamming strategy. 

In Section III we present the proof of our main result,
that of the outer bound on the capacity of online adversaries.
Section IV then improves on this result (by giving a tighter
bound on the probability of error) for the special case of
deterministic encoders (rather than the general stochastic encoders
considered in Section III).

II. Model and preliminaries 

We first reprise some standard notation. Let $d_H(\cdot,\cdot)$ denote
the Hamming distance function between two vectors (number
of locations in which two vectors of the same length differ).
The Hamming weight $wt_H(\mathbf{x})$ of a vector $\mathbf{x}$ is the Hamming
distance between that vector and the all-zeros vector. Let
$\log(\cdot)$ denote the binary logarithm, here and throughout. As is
common, the notation $H(A)$ is used to denote the (binary)
entropy of a random variable $A$, $H(A|B)$ to denote the
conditional entropy of $A$ given $B$, and $I(A;B)$ to denote
the mutual information between $A$ and $B$. Also, for any real
number $x \in (0, 1)$, $H(x)$ denotes the binary entropy
function. Properties of and inequalities between these functions
are referenced at the point in the text where needed. The
indicator function $\mathbb{1}(\text{condition})$ takes value 1 if condition
is true, and 0 otherwise.

Let the input and the output alphabets of the channel
be $\mathcal{X}$ and $\mathcal{Y}$ respectively. For any positive integer $k$, let
$[k] = \{1, 2, \ldots, k\}$. We let $\mathcal{U} = [2^{nR}]$ denote Alice's message
set, and $U$ denote the message random variable uniformly
distributed in $\mathcal{U}$. A deterministic code of rate $R$ and block-length
$n$ is a pair of maps $\mathcal{C}_d = (\Phi, \Psi)$ where $\Phi : \mathcal{U} \to \mathcal{X}^n$
and $\Psi : \mathcal{Y}^n \to \mathcal{U}$ are deterministic maps. The map $\Phi$ is called
the encoder and the map $\Psi$ is called the decoder.

A code with stochastic encoding and decoding of rate $R$
and block-length $n$ is a pair of maps $\mathcal{C}_s = (\Phi, \Psi)$ where
$\Phi : \mathcal{U} \to \mathcal{X}^n$ and $\Psi : \mathcal{Y}^n \to \mathcal{U}$ are probabilistic maps.
The random map $\Phi$ gives a probability distribution $\rho(\cdot|u)$
on $\mathcal{X}^n$ for every $u \in \mathcal{U}$. The mapping $\Psi(\mathbf{y})$ is a random
variable taking values from $\mathcal{U}$. The encoding $\Phi$ is equivalently
represented by first picking a random variable $R$ from a set
$\mathcal{R}$ according to a conditional distribution $\rho_{R|U}(\cdot|u)$, and then
applying a deterministic encoder map $\Phi : \mathcal{U} \times \mathcal{R} \to \mathcal{X}^n$.
Note that our definition does not preclude there existing pairs
$(u,r)$ and $(u',r')$ such that $\Phi(u,r) = \Phi(u',r')$. As we are
addressing upper bounds on the capacity $C(p)$ in this work, it
is crucial to prove our results in the stochastic setting above
- any bounds proved in the stochastic setting also hold in the
deterministic setting.

A causal adversarial strategy of block-length $n$ is a sequence
of (possibly random) mappings $\mathrm{Adv} = \{f^{(i)}_{\mathcal{C}} : i \in
[n]\}$. Here each $f^{(i)}_{\mathcal{C}} : \mathcal{X}^i \times \mathcal{E}^{i-1} \to \mathcal{E}$ depends on the code $\mathcal{C}$,
and for each time $i \in [n]$ chooses an action at time $i$,
$e_i = f^{(i)}_{\mathcal{C}}(x_1, \ldots, x_i, e_1, \ldots, e_{i-1}) \in \mathcal{E}$ - the inputs to $f^{(i)}_{\mathcal{C}}$ are the past and
current channel inputs $(x_1, x_2, \ldots, x_i)$ and its own previous
actions $(e_1, e_2, \ldots, e_{i-1})$. The resulting channel output at time
$i$ is $y_i = x_i + e_i$ (addition modulo 2). In our setting $\mathcal{E} = \{0,1\}$. The strategy
obeys constraint $p$ if the Hamming weight $\|\mathbf{e}\| = wt_H(\mathbf{e})$
of $\mathbf{e} = (e_1, \ldots, e_n)$ is at most $pn$ over the randomness in the
message, encoder, and strategy. For a given adversarial strategy
and an input codeword $\mathbf{x}$, the strategy produces a (possibly
random) $\mathbf{e}$ and the output is $\mathbf{y} = \mathbf{x} \oplus \mathbf{e}$. Let $\Pr_{\mathrm{Adv}}(\mathbf{y}|\mathbf{x})$ denote
the probability of an output $\mathbf{y}$ given an input $\mathbf{x}$ under the
strategy Adv, where this strategy might depend on $(u, r)$ via the
adversary's causal observations of $\mathbf{x}$ - to simplify notation we
henceforth do not make this explicit. When the block-length is
understood from the context, let $\mathrm{Adv}(p)$ denote all adversarial
strategies obeying constraint $p$.

The (average) probability of error for a code with stochastic
encoding and decoding is given by

$$\bar{\varepsilon} = \max_{\mathrm{Adv} \in \mathrm{Adv}(p)} \frac{1}{2^{nR}} \sum_{u=1}^{2^{nR}} \sum_{r \in \mathcal{R}} \rho_{R|U}(r|u) \sum_{\mathbf{y}} \Pr_{\mathrm{Adv}}(\mathbf{y} | \Phi(u,r)) \Pr(\Psi(\mathbf{y}) \neq u), \tag{1}$$

where the probability $\Pr(\Psi(\mathbf{y}) \neq u)$ is over any randomness
in the decoder (but there is no conditioning on $r$ since shared
randomness between the encoder and the decoder is not
allowed). We can interpret the error as the error in expectation
over Alice choosing a message $U = u$ and a codeword
$\mathbf{X} = \Phi(u,r)$ according to the conditional distribution $\rho(\mathbf{x}|u)$.

A rate $R$ is achievable against a causal adversary under
average error if for every $\delta > 0$ there exist infinitely many
block-lengths $\{n_i\}$, such that for each $n_i$ there is an $n_i$
block-length (stochastic) code of rate at least $R$ and average
probability of error at most $\delta$. The supremum of all achievable
rates is the capacity. We denote by $C(p)$ the capacity of the
channel corresponding to adversaries parametrized by $p$.

Consider a code of block-length $n$, rate $R$ and error probability
$\delta$. We can, without loss of generality (w.l.o.g.), assume
that the encoding probabilities $\{\rho(\mathbf{x}|u) : \mathbf{x} \in \{0,1\}^n, u \in
[2^{nR}]\}$ are rational. To see why this is the case, note that for
any small $\eta > 0$ we can find rational numbers $\{\tilde{\rho}(\mathbf{x}|u)\}$ such
that $\rho(\mathbf{x}|u) - \eta \leq \tilde{\rho}(\mathbf{x}|u) \leq \rho(\mathbf{x}|u)$. Now consider a code with
encoding probabilities $q(\mathbf{x}|u) = \tilde{\rho}(\mathbf{x}|u)$ for $\mathbf{x} \neq \mathbf{0}$ and assign
the remaining probability to $\mathbf{0}$. Under the same decoder, this
code has error probability at most $\delta + 2^{n+nR}\eta$, but since $\eta$
was arbitrary, the error is at most $2\delta$.

Now, for a given stochastic code, let $N$ be the least common
multiple of the denominators of $\rho(\mathbf{x}|u)$ for all $\mathbf{x}, u$. Each
codeword $\mathbf{x}$ of $u$ can be treated as $N\rho(\mathbf{x}|u)$ copies of the
same codeword with conditional probability $1/N$ each. So we
can equivalently associate a random variable $R$ with $|\mathcal{R}| = N$
s.t. the conditional distribution $\rho_{R|U}(\cdot|u)$ is uniform, and the
encoding map $\Phi(u, \cdot)$ is not necessarily injective. Since we
consider the uniform message distribution, henceforth, w.l.o.g.,
we assume that the joint distribution $\rho_{U,R}(\cdot,\cdot)$ is uniform.
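The least-common-multiple construction can be made concrete with a small sketch. The data layout (a dictionary from messages to codeword distributions) and the name `uniformize` are hypothetical choices for illustration; the code assumes Python 3.9 or later for `math.lcm`.

```python
from fractions import Fraction
from math import lcm

def uniformize(encoder):
    """Turn a stochastic encoder with rational probabilities into a table
    indexed by a uniform random seed.

    encoder maps each message u to a dict {codeword: Fraction probability}.
    Returns (N, table), where table[u] lists N codewords (with repetition);
    drawing r uniformly from range(N) and sending table[u][r] reproduces
    the original distribution for message u.
    """
    denominators = [prob.denominator
                    for dist in encoder.values()
                    for prob in dist.values()]
    N = lcm(*denominators)
    table = {}
    for u, dist in encoder.items():
        copies = []
        for codeword, prob in dist.items():
            # N * prob is an integer because N is a multiple of the denominator
            copies.extend([codeword] * int(prob * N))
        table[u] = copies
    return N, table
```

Note that `table[u]` generally contains repeated codewords, which mirrors the remark that $\Phi(u, \cdot)$ need not be injective.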

We use a version of Plotkin's bound [24] in our proof. This
result gives an upper bound on the number of codewords in any
binary code with a given minimum distance.

Theorem 2 (Plotkin bound [24]) There are at most
$\frac{2 d_{\min}}{2 d_{\min} - n}$ codewords in any binary code of block-length $n$
with minimum distance $d_{\min} > n/2$.
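As a quick illustration of Theorem 2, the bound can be evaluated directly. This is a minimal sketch; the function name is an assumption, and the result is rounded down because the number of codewords is an integer.

```python
def plotkin_max_codewords(n, d_min):
    """Largest integer not exceeding 2*d_min / (2*d_min - n).

    Only valid in the Plotkin regime d_min > n/2.
    """
    if 2 * d_min <= n:
        raise ValueError("Plotkin bound requires d_min > n/2")
    return (2 * d_min) // (2 * d_min - n)
```

For instance, with block-length 7 and minimum distance 4 the bound evaluates to 8 codewords.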

III. Proof of Theorem 1

In this section we analyze an adversarial attack for the
general case of stochastic encoders and decoders. For fully
deterministic codes the analysis is more combinatorial and the
error bounds are somewhat better, as shown in Section IV.

Let $p \in [0, 1/4]$ and let $\bar{p} \leq p$. Without loss of generality
we assume that $\bar{p}n$ is an integer - if not, Calvin can simply
choose the largest $\bar{p}'$ smaller than $\bar{p}$ such that $\bar{p}'n$ is an integer.
Asymptotically in $n$, the effect of this quantization on our outer
bound is negligible. Let $\epsilon > 0$. In what follows we prove that
the rate of communication over the causal adversarial channel
(with parameter $p$) is bounded by

$$R \leq C + \epsilon, \tag{2}$$

where

$$C = \alpha(p,\bar{p}) \left[ 1 - H\left( \frac{\bar{p}}{\alpha(p,\bar{p})} \right) \right], \tag{3}$$

and

$$\alpha(p,\bar{p}) = 1 - 4(p - \bar{p}), \tag{4}$$

as defined in Theorem 1. Namely, if (2)-(4) is violated, for any
sufficiently large block-length $n$, and any ($n$-block stochastic)
code $\mathcal{C}_s = (\Phi, \Psi)$ shared by Alice and Bob, there exists an
adversarial jammer Adv that can impose a constant decoding
error. The decoding error we obtain will depend on $\epsilon > 0$.

For $\bar{p} = p$, the adversary can generate a noise sequence to
simulate a BSC with crossover probability arbitrarily close to
$p$, which yields an upper bound of $1 - H(p)$ on the capacity.
We therefore assume that $p - \bar{p} > 0$ and that $\epsilon < 2(p - \bar{p})$.
We show that for such $\epsilon > 0$ there cannot exist a sequence of
codes (each with rate at least $C + \epsilon$) of increasing block-length
$n$, such that the probability of error of these codes converges
to 0 asymptotically in $n$. To do so we will consider block-lengths
$n \geq \Omega(\epsilon^{-2})$. Note that this argument does not provide
lower bounds on the error of codes of a given block-length,
but instead shows a bound on the capacity. We elaborate on
this point at the end of the proof.

Our converse bound is based on a particular two-phase
adversarial strategy for Calvin that we call "babble-and-push."
Let $\ell = (\alpha + \epsilon/2)n$ and without loss of generality assume
$\ell \in \mathbb{N}$. For a vector $\mathbf{z}$ of length $n$, let $\mathbf{z}_1 = (z_1, z_2, \ldots, z_\ell)$ and
$\mathbf{z}_2 = (z_{\ell+1}, \ldots, z_n)$. In what follows, $\mathbf{z}_1$ will correspond
to the first phase of Calvin's attack, while $\mathbf{z}_2$ corresponds to
the second phase. For $p > 0$ the strategy is given as follows.

• ("Babble") Calvin chooses a random subset $\Gamma$ of $\bar{p}n$
indices uniformly from the set of all $(\bar{p}n)$-sized subsets
of $\{1, 2, \ldots, \ell\}$. For $i \in \Gamma$, Calvin flips bit $x_i$; that is, for
$i \in \{1, 2, \ldots, \ell\}$, $e_i = 1$ for $i \in \Gamma$ and $e_i = 0$ for $i \notin \Gamma$.

• ("Push") Calvin constructs the set of $(u, r)$ that have
encodings $\mathbf{x}(u, r) = \Phi(u,r)$ that are close to $\mathbf{y}_1 =
(y_1, \ldots, y_\ell)$. Namely, Calvin constructs the set

$$B_{\mathbf{y}_1} = \{(u,r) : d_H(\mathbf{y}_1, \mathbf{x}_1(u,r)) = \bar{p}n\}, \tag{5}$$

and selects an element $(u',r') \in B_{\mathbf{y}_1}$ uniformly at random.
Calvin then considers the corresponding codeword
$\mathbf{x}' = \Phi(u',r')$. Given the selected $\mathbf{x}'$, for $i > \ell$, if
$x_i \neq x'_i$, Calvin sets $e_i$ equiprobably to 0 or 1 until
$\sum_{j=1}^{i} e_j = pn$ or $i = n$. Note that, under our assumption
(w.l.o.g.) of uniform $\rho_{U,R}$, the a posteriori distribution
of Alice's choice $(u, r)$ given $\mathbf{y}_1$ is also uniform in $B_{\mathbf{y}_1}$.
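The two phases above can be sketched on a toy codebook as follows. This is an illustrative simulation under simplifying assumptions, not the construction used in the proof: each codebook entry stands for one pair $(u,r)$, the parameters `ell`, `babble_flips` (playing the role of $\bar{p}n$) and `budget` (playing the role of $pn$) are passed in directly, and all names are hypothetical.

```python
import random

def hamming(a, b):
    """Number of positions in which two equal-length sequences differ."""
    return sum(s != t for s, t in zip(a, b))

def babble_and_push(x, codebook, ell, babble_flips, budget, rng):
    """Simulate Calvin's attack on Alice's codeword x (a list of bits).

    x must be one of the entries of codebook. Returns the error vector e,
    the channel output y, and the codeword x_prime Calvin pushes towards.
    """
    n = len(x)
    e = [0] * n
    # Babble: flip a uniformly random subset of the first ell positions.
    for i in rng.sample(range(ell), babble_flips):
        e[i] = 1
    y1 = [x[i] ^ e[i] for i in range(ell)]
    # Entries whose prefix is at distance exactly babble_flips from y1;
    # Alice's own codeword always qualifies, so the list is non-empty.
    consistent = [cw for cw in codebook
                  if hamming(y1, cw[:ell]) == babble_flips]
    x_prime = rng.choice(consistent)
    # Push: where x and x_prime differ, flip with probability 1/2,
    # reading x[i] only at time i and stopping once the budget is spent.
    used = babble_flips
    for i in range(ell, n):
        if used >= budget:
            break
        if x[i] != x_prime[i] and rng.random() < 0.5:
            e[i] = 1
            used += 1
    y = [xi ^ ei for xi, ei in zip(x, e)]
    return e, y, x_prime
```

The push loop only inspects `x[i]` at step `i`, which is what keeps the simulated strategy causal; the babble positions are chosen independently of `x`.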

We start by proving the following technical lemma that we 
use in our proof. 

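Lemma 3 Let $V$ be a random variable on a discrete finite
set $\mathcal{V}$ with entropy $H(V) \geq \lambda$, and let $V_1, V_2, \ldots, V_m$ be i.i.d.
copies of $V$. Then

$$\Pr(\{V_i : i = 1, \ldots, m\} \text{ are all distinct}) \geq \left( \frac{\lambda - 1 - \log m}{\log |\mathcal{V}|} \right)^{m-1}. \tag{6}$$

Proof: Fix $i < m$ and a set $v_1, v_2, \ldots, v_i \in \mathcal{V}$. Let $A_i =
\{v_1, \ldots, v_i\}$ and let $W_i = \mathbb{1}(V_{i+1} \in A_i)$, where $\mathbb{1}(\cdot)$ denotes
the indicator function. We can write the distribution of $V_{i+1}$ as
a mixture:

$$\Pr[V_{i+1} = v] = \sum_{j \in \{0,1\}} \Pr[W_i = j] \Pr[V_{i+1} = v \mid W_i = j].$$

We can bound from above the entropy of $V_{i+1}$ as:

$$H(V_{i+1}) \leq H(V_{i+1}|W_i) + H(W_i) = \sum_{j \in \{0,1\}} \Pr[W_i = j] H(V_{i+1}|W_i = j) + H(W_i).$$

Since conditioning reduces entropy and the support of $V_{i+1}$
conditioned on $W_i = 1$ is at most $i$, we have

$$\lambda \leq 1 + \log i + \Pr[W_i = 0] \log |\mathcal{V}|.$$

Namely,

$$\Pr[W_i = 0] \geq \frac{\lambda - 1 - \log i}{\log |\mathcal{V}|} \geq \frac{\lambda - 1 - \log m}{\log |\mathcal{V}|}.$$

But the event that each $V_i$ is distinct is equivalent to the event
that for each $i \in \{1, \ldots, m-1\}$, $W_i$ is 0. Since the bound above
holds for every choice of $v_1, \ldots, v_i$, multiplying the $m - 1$
conditional probabilities gives (6). ■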
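Lemma 3 can be sanity-checked numerically on small distributions. The sketch below is illustrative only (the function names are assumptions): it computes the exact probability that $m$ i.i.d. draws are distinct as $m!$ times the degree-$m$ elementary symmetric polynomial of the probabilities, and compares it with the right-hand side of (6) taking $\lambda = H(V)$.

```python
import math

def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(q * math.log2(q) for q in probs if q > 0)

def prob_all_distinct(probs, m):
    """Exact probability that m i.i.d. draws from probs are all distinct."""
    # sym[j] accumulates the elementary symmetric polynomial of degree j.
    sym = [1.0] + [0.0] * m
    for q in probs:
        for j in range(m, 0, -1):
            sym[j] += q * sym[j - 1]
    return math.factorial(m) * sym[m]

def lemma3_lower_bound(probs, m):
    """Right-hand side of (6) with lambda set to the entropy of probs."""
    base = (entropy(probs) - 1 - math.log2(m)) / math.log2(len(probs))
    return max(base, 0.0) ** (m - 1)
```

For a uniform distribution on 64 symbols and `m = 2`, the exact probability is 63/64 while the bound evaluates to 2/3, so the lemma holds with room to spare in this case.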

To prove the upper bound, we now present a series of
claims. Let $\mathbf{X}$ denote the random variable corresponding to
Alice's input codeword and let $\mathbf{Y}$ be the output of the channel.
Thus $\mathbf{X}_1 \in \{0, 1\}^\ell$ is Alice's input during the "babble" phase
of length $\ell$ and $\mathbf{X}_2$ is her input during the "push" phase; the
randomness comes from the message $U$ and the stochastic
encoding. Similarly, $\mathbf{Y}_1$ is the random variable corresponding
to the $\ell$ bits received by Bob during the "babble" phase, and
$\mathbf{Y}_2$ the $n - \ell$ bits of the "push" phase. Let Adv denote the
"babble-and-push" adversarial strategy.

Let

$$A_0 = \{\mathbf{y}_1 : H(U|\mathbf{Y}_1 = \mathbf{y}_1) \geq n\epsilon/4\},$$

where the entropy $H(U|\mathbf{Y}_1 = \mathbf{y}_1)$ is measured over the
randomness of the encoder, the message, and any randomness
in Calvin's action during the "babble" phase. Further, let the
event $E_0$ be defined as

$$E_0 = \{\mathbf{Y}_1 \in A_0\}. \tag{7}$$

Claim 4 For the "babble-and-push" attack Adv,

$$\Pr_{\mathrm{Adv}}(E_0) \geq \epsilon/4. \tag{8}$$

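Proof: By the data processing inequality ($U \to \mathbf{X}_1 \to
\mathbf{Y}_1$ form a Markov chain and hence $I(U; \mathbf{Y}_1) \leq I(\mathbf{X}_1; \mathbf{Y}_1)$),
and the choice of Calvin's strategy, we have

$$I(U;\mathbf{Y}_1) \leq I(\mathbf{X}_1;\mathbf{Y}_1) \leq \ell \left( 1 - H(\bar{p}n/\ell) \right) = (\alpha n + \epsilon n/2) \left( 1 - H\left( \frac{\bar{p}}{\alpha + \epsilon/2} \right) \right).$$

Therefore

$$H(U|\mathbf{Y}_1) \geq H(U) - n(\alpha + \epsilon/2) \left[ 1 - H\left( \frac{\bar{p}}{\alpha + \epsilon/2} \right) \right]$$
$$\geq n(C + \epsilon) - n(\alpha + \epsilon/2) \left[ 1 - H\left( \frac{\bar{p}}{\alpha + \epsilon/2} \right) \right]$$
$$= n\epsilon/2 + n \left\{ (\alpha + \epsilon/2) H\left( \frac{\bar{p}}{\alpha + \epsilon/2} \right) - \alpha H\left( \frac{\bar{p}}{\alpha} \right) \right\}$$
$$\geq n\epsilon/2.$$

Here the first inequality follows from the definition of conditional
entropy, the second from the assumption underlying
this proof by contradiction that $nR$ (and hence $H(U)$) violates
(2)-(4), and the third from the fact that the function $\alpha H(\bar{p}/\alpha)$
is monotonically increasing in $\alpha$ since the function's derivative
with respect to $\alpha$ equals $\log(\alpha/(\alpha - \bar{p}))$ which is always
positive. Thus the expected value of $H(U|\mathbf{Y}_1 = \mathbf{y}_1)$ over $\mathbf{y}_1$
is at least $n\epsilon/2$, and the maximum value of $H(U|\mathbf{Y}_1 = \mathbf{y}_1)$
is $nR$. Applying the Markov inequality to the random variable
$nR - H(U|\mathbf{Y}_1 = \mathbf{y}_1)$, we see that

$$\Pr[nR - H(U|\mathbf{Y}_1 = \mathbf{y}_1) > nR - n\epsilon/4] \leq \frac{nR - n\epsilon/2}{nR - n\epsilon/4} = 1 - \frac{\epsilon/4}{R - \epsilon/4},$$

and hence

$$\Pr[H(U|\mathbf{Y}_1 = \mathbf{y}_1) \geq n\epsilon/4] \geq \frac{\epsilon/4}{R - \epsilon/4}.$$

Using the fact that $R \leq 1$ yields the result. ■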
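Two steps of the proof above lend themselves to a quick numerical check: the derivative of $\alpha H(\bar{p}/\alpha)$ with respect to $\alpha$, and the final Markov-inequality bound. The following sketch is illustrative; the function names and the sample values are assumptions.

```python
import math

def binary_entropy(x):
    """Binary entropy in bits, with H(0) = H(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def g(alpha, p_bar):
    """The function alpha * H(p_bar / alpha) from the proof."""
    return alpha * binary_entropy(p_bar / alpha)

def g_prime(alpha, p_bar):
    """Claimed derivative of g in alpha: log2(alpha / (alpha - p_bar))."""
    return math.log2(alpha / (alpha - p_bar))

def markov_lower_bound(rate, eps):
    """Lower bound (eps/4) / (R - eps/4) on Pr[H(U | Y_1 = y_1) >= n*eps/4]."""
    return (eps / 4) / (rate - eps / 4)
```

A central finite difference of `g` at, say, `alpha = 0.6` and `p_bar = 0.05` agrees with `g_prime`, and `markov_lower_bound(R, eps)` is at least `eps / 4` whenever `R <= 1`.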
Now consider drawing $m$ pairs $(U_i, R_i)$ from $B_{\mathbf{y}_1}$ i.i.d.
$\sim \rho_{U,R|\mathbf{y}_1}$ (which happens to be uniform). Note that the
marginal distribution of $U_i$ is also i.i.d. $\sim \rho_{U|\mathbf{y}_1}$, which is
not necessarily uniform. Let

$$E_1 = \{\{U_1, U_2, \ldots, U_m\} \text{ are all distinct}\}. \tag{9}$$

Claim 5 Let $\rho_{U|\mathbf{y}_1}$ be the conditional distribution of $U$ given
$\mathbf{y}_1$ under Adv. Let $U_1, U_2, \ldots, U_m$ be $m$ random variables
drawn i.i.d. according to $\rho_{U|\mathbf{y}_1}$. Then for large enough $n$,

$$\Pr(E_1 \mid E_0) \geq (\epsilon/5)^{m-1}. \tag{10}$$

Proof: The proof follows from Claim 4 and Lemma 3
by using $\lambda = n\epsilon/4$, $V = U$, and the fact that there are
at most $2^n$ messages, so $|\mathcal{V}| \leq 2^n$. The lower bound in (6)
then becomes $\left( \frac{n\epsilon/4 - 1 - \log m}{n} \right)^{m-1}$. For fixed $m$ there exists
a sufficiently large $n$ such that $\epsilon/4 - (1 + \log m)/n \geq \epsilon/5$. ■
The preceding two claims establish a lower bound on the
probability that $\mathbf{Y}_1$ takes a value such that the distribution of
the message $U$ conditioned on $\mathbf{Y}_1$ has sufficient entropy. For
such values $\mathbf{y}_1$ of $\mathbf{Y}_1$, we now use the fact that Alice's pair
$(u, r)$ is uniform in $B_{\mathbf{y}_1}$ to analyze the probability that Calvin's
"push" attack succeeds. Let $U'$ and $\mathbf{X}'$ denote the random
choice of Calvin's message and codeword in the "push" phase.
We show that the following two events occur with probability
bounded away from zero:

$$E_2 = \{U' \neq U\} \tag{11}$$

$$E_3 = \{d_H(\mathbf{X}_2, \mathbf{X}'_2) \leq 2(p - \bar{p})n - \epsilon n/8\} \tag{12}$$

The first event is that Calvin chooses a different message than
Alice and the second is that he chooses a codeword that is
close enough to Alice's. The occurrence of the first event
ensures that the codeword Calvin chooses to try to confuse
Bob into thinking might have been transmitted corresponds
to a message $u'$ different than Alice's actual message $u$. The
occurrence of the second event ensures that the two codewords
chosen ($\mathbf{x}_2$ chosen by Alice, and $\mathbf{x}'_2$ by Calvin) are "close
enough" for Calvin to be able to push Bob's received codeword
halfway between $\mathbf{x}_2$ and $\mathbf{x}'_2$.

Claim 6 For the "babble-and-push" attack Adv,

$$\Pr_{\mathrm{Adv}}(E_2 \text{ and } E_3 \mid E_0) \geq \epsilon^{O(1/\epsilon)}. \tag{13}$$

Proof: Conditioned on $E_0$, the realization $\mathbf{y}_1$ satisfies
$H(U|\mathbf{Y}_1 = \mathbf{y}_1) \geq \epsilon n/4$. We first use Claim 5 to lower
bound the probability that $E_2$ holds. First consider randomly
sampling a set of mutually independent pairs $S = \{(u_i, r_i) :
i \in [m]\}$ uniformly from $B_{\mathbf{y}_1}$, and let $\mathbf{X}^i$ be the codeword for
$(u_i, r_i)$.

Claim 5 shows that with probability at least $(\epsilon/5)^{m-1}$, all
the messages in $S$ are distinct. In particular, this shows that

$$\Pr_{\mathrm{Adv}}(E_2 \mid E_0) \geq \epsilon/5.$$

Turning to $E_3$, applying Claim 5 for general $m$ shows that
the probability that $m$ draws from the conditional distribution
$\rho_{U|\mathbf{y}_1}$ yield unique messages is lower bounded by $(\epsilon/5)^{m-1}$.
Plotkin's bound [24] (reprised in Theorem 2) shows that there
do not exist binary error-correcting codes of block-length
$n - \ell$ and minimum distance $d$ with more than

$$\frac{2d}{2d - (n - \ell)}$$

codewords. Setting $m = 17/\epsilon$, this bound implies that with
probability at least $(\epsilon/5)^{m-1}$ there must exist codewords $\mathbf{x}, \mathbf{x}'$
corresponding to $(u, r)$ and $(u', r')$ respectively (with $u \neq u'$)
within a distance $d$ that satisfies

$$\frac{17}{\epsilon} \leq \frac{2d}{2d - (n - \ell)}.$$

Solving for $d$ and using $\ell = (1 - 4(p - \bar{p}) + \epsilon/2)n$, so that
$n - \ell = (4(p - \bar{p}) - \epsilon/2)n$, shows that $d$ satisfies

$$d \leq \frac{17}{17 - \epsilon} \left( 2(p - \bar{p})n - \frac{\epsilon n}{4} \right) < 2(p - \bar{p})n - \epsilon n/8.$$

Let $\Delta = 2(p - \bar{p})n - \epsilon n/8$.

Let $\gamma$ be the fraction of pairs $(u, r)$ and $(u', r')$ in $B_{\mathbf{y}_1}$ that
satisfy $E_2$ and $E_3$. We would like to lower bound $\gamma$. A union
bound shows that the probability over the selection of $S$ satisfies
the upper bound

$$\Pr\Big(\bigcup_{i \ne j} \big(\{d_H(\mathbf{X}^i, \mathbf{X}^j) < \Delta\} \text{ and } \{U^i \ne U^j\}\big)\Big) \le m^2 \gamma. \tag{14}$$

However, the earlier argument shows that by selecting $m = 17/\epsilon$
pairs in $S$, we get a lower bound of $(\epsilon/5)^{m-1}$ on the
probability that (a) all $\{U^i\}$ are distinct, and (b) at least one
pair $\mathbf{X}^i, \mathbf{X}^j$ has distance less than $\Delta$:

$$\Pr\Big(\{\text{all } U^i \in S \text{ are distinct}\} \text{ and } \bigcup_{i \ne j} \{d_H(\mathbf{X}^i, \mathbf{X}^j) < \Delta\}\Big) \ge (\epsilon/5)^{m-1}. \tag{15}$$

As the event analyzed in Equation (14) includes that analyzed
in Equation (15), we have that

$$\gamma \ge \frac{1}{m^2}\left(\frac{\epsilon}{5}\right)^{m-1} = \frac{\epsilon^2}{17^2}\left(\frac{\epsilon}{5}\right)^{17/\epsilon - 1}.$$

Therefore, by the definition of $\gamma$, we conclude our assertion.

■ 
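The pigeonhole step used in this proof (more distinct codewords than Plotkin's bound allows forces some pair to be close) can be checked by brute force on a toy instance. The Python sketch below is illustrative only: the block length 8, the distance 5, and all function names are assumptions chosen for the example, not parameters taken from the analysis.

```python
import itertools
import random


def hamming(a: int, b: int) -> int:
    """Hamming distance between two binary words stored as integers."""
    return bin(a ^ b).count("1")


def plotkin_bound(length: int, d: int) -> float:
    """Plotkin-type bound 2d / (2d - length); only meaningful when 2d > length."""
    if 2 * d <= length:
        raise ValueError("the bound requires 2d > length")
    return 2 * d / (2 * d - length)


def min_pairwise_distance(words) -> int:
    """Smallest Hamming distance over all pairs of the given words."""
    return min(hamming(a, b) for a, b in itertools.combinations(words, 2))


def sample_distinct_words(length: int, m: int, rng: random.Random):
    """Draw m distinct binary words of the given length."""
    return rng.sample(range(2 ** length), m)


if __name__ == "__main__":
    rng = random.Random(0)
    length, d = 8, 5
    m = int(plotkin_bound(length, d)) + 1  # one more word than the bound allows
    words = sample_distinct_words(length, m, rng)
    # Some pair must be at distance strictly less than d.
    print(m, min_pairwise_distance(words))
```

In the proof the same reasoning is applied with block-length $n - \ell$ and $m = 17/\epsilon$ sampled codewords; the toy parameters only make the exhaustive distance computation cheap.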

The next step is to show that Calvin does not "run out" of
bit flips during the second "push" phase of his attack. This
follows directly from Chernoff's bound [25].

We now analyze Calvin's action during the "push" phase.
This action can be viewed as being equivalent to the following
two stages. In the first stage, $d_H(\mathbf{x}_2, \mathbf{x}_2')$ bits are drawn i.i.d.
Bernoulli-(1/2); these bits comprise the intended error vector
$\mathbf{e}$. However, Calvin may not have the power to impose this
intended vector in the push phase if the weight of $\mathbf{e}$ is too
large. In general, the bit-flips in Calvin's actual error vector
$\mathbf{e}_2$ correspond to the components of $\mathbf{e}$ up to the point that he
runs out of his bit-budget.

Let $d$ be the distance between the $\mathbf{x}_2$ chosen by Alice and
$\mathbf{x}_2'$ chosen by Calvin and let the event $E_4$ be defined as

$$E_4 = \left\{ \mathrm{wt}_H(\mathbf{e}) \in \left( \frac{d}{2} - \frac{\epsilon n}{16},\; \frac{d}{2} + \frac{\epsilon n}{16} \right) \right\}. \tag{16}$$

Claim 7 For the "babble-and-push" attack Adv,

$$\Pr_{\mathrm{Adv}}(E_4 \mid E_2, E_3) \ge 1 - 2^{-\Omega(\epsilon^2 n)}. \tag{17}$$

Proof: As $d$ is the distance between the $\mathbf{x}_2$ chosen by
Alice and $\mathbf{x}_2'$ chosen by Calvin, without any constraint, Calvin
would flip $d/2$ locations in expectation. Conditioned on $E_2$
and $E_3$, we have the following upper bound:

$$\frac{d}{2} < (p - \bar{p})n - \epsilon n/16.$$

Assume that $d/2 = (p - \bar{p})n - \epsilon n/16$ (for smaller values of
$d$ the bound is only tighter). By Chernoff's bound [25], the
probability that the number of bit flips in $\mathbf{e}$ (i.e., the Hamming
weight of $\mathbf{e}$) deviates from the expectation by more than $\epsilon n/16$
is at most $2^{-\Omega(\epsilon^2 n)}$. ■

Note that the number of bit flips in the first phase of the
algorithm is exactly $\bar{p}n$, and thus Claim 7 implies that with
high probability the total number of bit flips in $\mathbf{e}$ in the second
phase will not exceed $d/2 + \epsilon n/16 < (p - \bar{p})n$ and will not be
significantly less than that expected (i.e., less than $d/2 - \epsilon n/16$;
in this case Bob might be able to conclude that $\mathbf{x}_2'$ was not
transmitted). If this is not the case, our analysis assumes Calvin
(in the worst case for him) fails to jam Alice's transmission
to Bob.
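The concentration step behind Claim 7 can be examined numerically. The Python sketch below compares the exact probability that the weight of $d$ i.i.d. Bernoulli-(1/2) bits deviates from $d/2$ by more than $t$ with the Hoeffding form of the Chernoff bound, $2\exp(-2t^2/d)$. The specific form of the bound and the sample values of $d$ and $t$ are assumptions made for illustration; the text only invokes "Chernoff's bound" without fixing constants.

```python
import math


def tail_probability(d: int, t: float) -> float:
    """Exact Pr(|W - d/2| > t) for W distributed as Binomial(d, 1/2)."""
    favourable = sum(
        math.comb(d, k) for k in range(d + 1) if abs(k - d / 2) > t
    )
    return favourable / 2 ** d


def hoeffding_bound(d: int, t: float) -> float:
    """Hoeffding upper bound 2 * exp(-2 t^2 / d), capped at 1."""
    return min(1.0, 2 * math.exp(-2 * t * t / d))


if __name__ == "__main__":
    # Hypothetical numbers: n = 400, epsilon = 0.4, p - p_bar = 0.2,
    # so d = 2 * 80 - 20 = 140 and t = epsilon * n / 16 = 10.
    d, t = 140, 10
    print(tail_probability(d, t), hoeffding_bound(d, t))
```

With $t = \epsilon n/16$ and $d \le n$ the exponent $2t^2/d$ is at least $\epsilon^2 n/128$, which is the $\Omega(\epsilon^2 n)$ behaviour used in the claim.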



Theorem 8 For any code with stochastic encoding of rate
$R = C + \epsilon$, under Calvin's "babble-and-push" strategy the
average error probability $\bar{\varepsilon}$ is lower bounded by $\epsilon^{O(1/\epsilon)}$.

Proof: The main idea behind the proof of our outer bound
is that conditioned on events $E_0$, $E_2$, $E_3$, and $E_4$ (whose
probabilities of occurrence are analyzed in Claims 4, 6, and 7),
Calvin can "symmetrize" the channel [14]. That is, Calvin can
choose to inject bit-flips in a manner so that Bob is unable
to distinguish between two possible codewords $\mathbf{x}$ and $\mathbf{x}'$
(corresponding to different messages $u$ and $u'$) transmitted by
Alice. Calvin does this by ensuring (with probability bounded
away from zero) that the codeword received by Bob, $\mathbf{y}$, is
likely to equal either $\mathbf{x} + \mathbf{e}$ or $\mathbf{x}' + \mathbf{e}'$ for two valid pairs $(\mathbf{x}, \mathbf{e})$
and $(\mathbf{x}', \mathbf{e}')$ of transmitted codewords and bit-flip vectors.

Let $(u, r)$ denote the message and randomness of Alice, $\mathbf{y}_1$
be the received codeword in the "babble" phase, and $(u', r')$ be
the message and randomness chosen by Calvin for the "push"
phase. Let $p(\mathbf{y}_1, u, r, u', r')$ be the joint distribution of these
variables under Alice's uniform choice of $(u, r)$ and Calvin's
attack. For each $\mathbf{y}_2$, let $p(\mathbf{y}_2 \mid \mathbf{y}_1, u, r, u', r')$ be the conditional
distribution of $\mathbf{y}_2$ under Calvin's attack.

The error probability can be written as

$$\bar{\varepsilon} = \sum_{\mathbf{y}_1, u, r, u', r'} p(\mathbf{y}_1, u, r, u', r') \sum_{\mathbf{y}_2} p(\mathbf{y}_2 \mid \mathbf{y}_1, u, r, u', r') \Pr(\Psi(\mathbf{y}_1, \mathbf{y}_2) \ne u).$$

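Let $\mathcal{F}$ be the set of tuples $(\mathbf{y}_1, u, r, u', r')$ satisfying events
$E_0$, $E_2$, and $E_3$. Claims 4 and 6 show that $p(\mathcal{F}) \ge (\epsilon/4) \cdot \epsilon^{O(1/\epsilon)}$.
For $(\mathbf{y}_1, u, r, u', r') \in \mathcal{F}$, we have that $u \ne u'$, and
that $\mathbf{x}_2(u, r)$ and $\mathbf{x}_2(u', r')$ are sufficiently close.

Assuming $E_4$ holds, if $\mathbf{y}_2$ results from $\mathbf{x}_2$ via $\mathbf{e}$, then $\mathbf{y}_2$ may
also have resulted from $\mathbf{x}_2'$ via $\mathbf{e}^c$ (the binary complement of
$\mathbf{e}$). Since $\mathbf{e}$ is generated via i.i.d. Bernoulli-(1/2) components,
$\mathbf{e}$ and $\mathbf{e}^c$ have the same probability.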
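The symmetry described here can be made concrete with a few lines of Python. In this sketch the intended error vector lives on the positions where the two codeword suffixes differ, and its complement is taken on those same positions; this reading of "binary complement", along with the helper names `push`, `complement_on_support`, and `apply_flips`, is an assumption made for the example.

```python
import random


def apply_flips(word, flips):
    """XOR a binary word with a binary flip pattern of the same length."""
    return [a ^ f for a, f in zip(word, flips)]


def push(x2, x2_alt, rng):
    """Flip each position where x2 and x2_alt differ with probability 1/2.

    Returns the received suffix y2 and the error vector e that produced it.
    """
    e = [rng.randint(0, 1) if a != b else 0 for a, b in zip(x2, x2_alt)]
    return apply_flips(x2, e), e


def complement_on_support(e, x2, x2_alt):
    """Complement e on the positions where x2 and x2_alt differ."""
    return [(1 - bit) if a != b else 0 for bit, a, b in zip(e, x2, x2_alt)]


if __name__ == "__main__":
    rng = random.Random(0)
    x2 = [rng.randint(0, 1) for _ in range(16)]
    x2_alt = [rng.randint(0, 1) for _ in range(16)]
    y2, e = push(x2, x2_alt, rng)
    e_c = complement_on_support(e, x2, x2_alt)
    # The same received word is explained by either codeword.
    print(apply_flips(x2_alt, e_c) == y2)
```

Because each differing position is flipped independently with probability 1/2, the patterns `e` and `e_c` are equally likely, which is what makes the two explanations indistinguishable to Bob.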

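Thus, for outputs $\mathbf{y}_2$ in the set $\mathcal{G}$ for which $E_4$ holds, the
conditional distribution is symmetric:

$$p(\mathbf{y}_2 \mid \mathbf{y}_1, u, r, u', r') = p(\mathbf{y}_2 \mid \mathbf{y}_1, u', r', u, r). \tag{18}$$

Then for $(\mathbf{y}_1, u, r, u', r') \in \mathcal{F}$, by Claim 7,

$$\sum_{\mathbf{y}_2 \in \mathcal{G}} p(\mathbf{y}_2 \mid \mathbf{y}_1, u, r, u', r') \ge 1 - 2^{-\Omega(\epsilon^2 n)}.$$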

Now, returning to the overall error probability, let $p(\mathbf{y}_1)$
be the unconditional probability of Bob receiving $\mathbf{y}_1$ in the
"babble" phase, where the probability is taken over Alice's
uniform choice of $(u, r)$ and Calvin's random babble $\mathbf{e}_1$. Since
the a-posteriori distributions of $(u, r)$ and $(u', r')$ given $\mathbf{y}_1$ are
independent and both uniform in $B_{\mathbf{y}_1}$, the joint distribution
can be written as

$$p(\mathbf{y}_1, u, r, u', r') = p(\mathbf{y}_1) \cdot \frac{1}{|B_{\mathbf{y}_1}|^2} = p(\mathbf{y}_1, u', r', u, r).$$

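Recall that for any $\mathbf{y}_2 \in \mathcal{G}$,

$$p(\mathbf{y}_2 \mid \mathbf{y}_1, u, r, u', r') = p(\mathbf{y}_2 \mid \mathbf{y}_1, u', r', u, r).$$

Thus,

$$\begin{aligned}
2\bar{\varepsilon} &\ge \sum_{\mathcal{F}} p(\mathbf{y}_1, u, r, u', r') \Bigg( \sum_{\mathbf{y}_2 \in \mathcal{G}} p(\mathbf{y}_2 \mid \mathbf{y}_1, u, r, u', r') \Pr(\Psi(\mathbf{y}_1, \mathbf{y}_2) \ne u) \\
&\qquad\qquad + \sum_{\mathbf{y}_2 \in \mathcal{G}} p(\mathbf{y}_2 \mid \mathbf{y}_1, u', r', u, r) \Pr(\Psi(\mathbf{y}_1, \mathbf{y}_2) \ne u') \Bigg) \\
&\ge \sum_{\mathcal{F}} p(\mathbf{y}_1, u, r, u', r') \sum_{\mathbf{y}_2 \in \mathcal{G}} p(\mathbf{y}_2 \mid \mathbf{y}_1, u, r, u', r') \big( \Pr(\Psi(\mathbf{y}_1, \mathbf{y}_2) \ne u) + \Pr(\Psi(\mathbf{y}_1, \mathbf{y}_2) \ne u') \big) \\
&\ge \sum_{\mathcal{F}} p(\mathbf{y}_1, u, r, u', r') \sum_{\mathbf{y}_2 \in \mathcal{G}} p(\mathbf{y}_2 \mid \mathbf{y}_1, u, r, u', r') \qquad (19) \\
&\ge \frac{\epsilon}{4} \cdot \epsilon^{O(1/\epsilon)} \cdot \big(1 - 2^{-\Omega(\epsilon^2 n)}\big).
\end{aligned}$$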



Our analysis implies a refined statement of Theorem 1.
Namely, let $c$ be a sufficiently large constant. For any block-length
$n$, any $\epsilon > 0$ and any encoding/decoding scheme of
Alice and Bob of rate (C(p) + + e, Calvin can cause a 
decoding error probability of at least e'^^^/^^. 



IV. Improved bounds for deterministic codes 

We now present an alternative analysis for the case of
deterministic encoding. Without loss of generality, we assume
each codeword corresponds to a unique message in $\mathcal{U}$ so there
are $2^{Rn}$ distinct equiprobable codewords in $\{\mathbf{x}(u) : u \in \mathcal{U}\}$,
with a unique codeword for each message. The attack is
the same as in Section III. Apart from the simpler proof,
the analysis below gives a decoding error $\bar{\varepsilon}$ proportional to
$\epsilon$, which improves over the decoding error presented for
stochastic encoding appearing in the body of this work.

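Using the notation of Section III, for any vector $\mathbf{y}_1$ consider
the set

$$B_{\mathbf{y}_1} = \{(\mathbf{x}, \mathbf{e}_1) : \mathbf{x}_1 + \mathbf{e}_1 = \mathbf{y}_1,\ \mathrm{wt}_H(\mathbf{e}_1) = \bar{p}n\}. \tag{20}$$

Here, $\mathbf{e}_1$ represents the potential error vector that Calvin imposes
in the first stage of his attack on the transmitted codeword
$\mathbf{x}$. Notice that the set $B_{\mathbf{y}_1}$ defined above is analogous to the
set defined in Section III. Namely, for any message $u$
a pair $(u, r) \in B_{\mathbf{y}_1}$ in Section III corresponds to a pair
$(\mathbf{x}(u), \mathbf{e}_1) \in B_{\mathbf{y}_1}$ defined above. We note that in the definition
above $B_{\mathbf{y}_1} \cap B_{\mathbf{y}_1'} = \emptyset$ for $\mathbf{y}_1 \ne \mathbf{y}_1'$, as we assume all codewords
to be distinct.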

Claim 9 With probability at least 1/2 over the codeword $\mathbf{x}$
sent by Alice and the actions of Calvin in the first stage of his
attack, the set $B_{\mathbf{y}_1}$ is of size at least $2^{\epsilon n/4}/2$.



Proof: The proof is obtained by the following counting
argument. The number of possible sets $B_{\mathbf{y}_1}$ is exactly
$2^{\ell} = 2^{\alpha n + \epsilon n/2}$. The number of pairs $(\mathbf{x}, \mathbf{e}_1)$ for a codeword $\mathbf{x}$ and
an error vector $\mathbf{e}_1$ (to be applied in the first stage by Calvin) is

$$2^{Rn}\binom{\ell}{\bar{p}n} \ge 2^{Rn}\binom{\alpha n}{\bar{p}n} \ge 2^{Rn} \cdot 2^{\alpha n H(\bar{p}/\alpha) - \epsilon n/4} \ge 2^{\alpha n + 3\epsilon n/4} = 2^{\ell} \cdot 2^{\epsilon n/4}.$$

Here the first inequality follows from the fact that $\ell \ge \alpha n$,
the second inequality from the standard bound
$2^{nH(k/n)}/(n+1) \le \binom{n}{k}$ (for instance [26, Theorem 11.1.3]) and the fact that
$n$ is sufficiently large with respect to $1/\epsilon$ and hence $2^{-\epsilon n/4}$
is smaller than any polynomial in $1/(n+1)$, and the third
inequality from the starting assumption that $R$ is at least
$\epsilon + \alpha(1 - H(\bar{p}/\alpha))$. Thus, the average size of a set $B_{\mathbf{y}_1}$ is at
least $2^{\epsilon n/4}$. Consider all the sets $B_{\mathbf{y}_1}$ of size less than half the
average, $2^{\epsilon n/4}/2$. The total number of pairs in the union
of these sets is at most

$$2^{\ell} \cdot 2^{\epsilon n/4}/2 \le \frac{1}{2} \cdot 2^{Rn}\binom{\ell}{\bar{p}n},$$

which is half the number of $(\mathbf{x}, \mathbf{e}_1)$ pairs. As each pair
is chosen with the same probability, we conclude that with
probability at least 1/2 the pair $(\mathbf{x}, \mathbf{e}_1)$ appears in a set $B_{\mathbf{y}_1}$
which is of size at least $2^{\epsilon n/4}/2$. This completes the proof of
our assertion. ■
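The counting argument relies on the standard entropy lower bound on binomial coefficients, $2^{nH(k/n)}/(n+1) \le \binom{n}{k}$. As a quick sanity check, the Python sketch below evaluates both sides for small parameters; the function names and the range of $n$ are choices made for this example only.

```python
import math


def binary_entropy(q: float) -> float:
    """Binary entropy H(q) in bits, with H(0) = H(1) = 0."""
    if q in (0.0, 1.0):
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)


def entropy_lower_bound(n: int, k: int) -> float:
    """The quantity 2^{n H(k/n)} / (n + 1), a lower bound on C(n, k)."""
    return 2 ** (n * binary_entropy(k / n)) / (n + 1)


if __name__ == "__main__":
    n, k = 40, 10
    print(entropy_lower_bound(n, k), math.comb(n, k))
```

The polynomial factor $1/(n+1)$ is what the proof absorbs into the $2^{-\epsilon n/4}$ slack for sufficiently large $n$.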
We now show that Claim 9 above implies that the transmitted
codeword $\mathbf{x}$ and the codeword $\mathbf{x}'$ chosen by Calvin are
distinct and of small Hamming distance apart with a positive
probability (independent of $n$).

Claim 10 Conditioned on Claim 9, with probability at least $\epsilon/64$,
$\mathbf{x} \ne \mathbf{x}'$ and $d_H(\mathbf{x}_2, \mathbf{x}_2') < 2(p - \bar{p})n - \epsilon n/8$.

Proof: Consider the undirected graph $\mathcal{G} = (V, E)$ in
which the vertex set $V$ consists of the set $B_{\mathbf{y}_1}$ and two nodes
$\mathbf{x}$ and $\mathbf{x}'$ are connected by an edge if
$d_H(\mathbf{x}_2, \mathbf{x}_2') < d = 2(p - \bar{p})n - \epsilon n/8$. The set of codewords defined by the suffixes
of an independent set $I$ in $\mathcal{G}$ corresponds to a binary error-correcting
code with block-length $n - \ell = 4(p - \bar{p})n - \epsilon n/2$
of size $|I|$ and minimum distance $d$.

By Plotkin's bound [24] (reprised in Theorem 2) there
do not exist binary error-correcting codes with more than
$\frac{2d}{2d - (4(p - \bar{p})n - \epsilon n/2)} + 1$ codewords. Thus $I$, any maximal
independent set in $\mathcal{G}$, must satisfy

$$|I| \le \frac{2(2(p - \bar{p})n - \epsilon n/8)}{2(2(p - \bar{p})n - \epsilon n/8) - 4(p - \bar{p})n + \epsilon n/2} + 1 = \frac{16(p - \bar{p})}{\epsilon} \le \frac{16}{\epsilon}. \tag{21}$$
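The simplification in (21) can be verified with exact rational arithmetic. The Python sketch below assumes the reading of (21) in which the Plotkin count is $2d/(2d - N) + 1$ with $d = 2(p - \bar{p})n - \epsilon n/8$ and $N = 4(p - \bar{p})n - \epsilon n/2$; the function name and the sample values are illustrative.

```python
from fractions import Fraction


def plotkin_plus_one(n: int, gap: Fraction, eps: Fraction) -> Fraction:
    """Evaluate 2d / (2d - N) + 1 for d = 2*gap*n - eps*n/8, N = 4*gap*n - eps*n/2.

    Here `gap` stands for p - p_bar (a name assumed for this sketch).
    """
    d = 2 * gap * n - eps * n / 8
    block_length = 4 * gap * n - eps * n / 2
    return 2 * d / (2 * d - block_length) + 1


if __name__ == "__main__":
    gap, eps = Fraction(1, 5), Fraction(1, 10)
    # The result does not depend on n and equals 16 * gap / eps.
    print(plotkin_plus_one(1000, gap, eps), 16 * gap / eps)
```

The denominator $2d - N$ collapses to $\epsilon n/4$, which is why the block length $n$ cancels and only the ratio $16(p - \bar{p})/\epsilon$ remains.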



By Turán's theorem [27], any undirected graph $\mathcal{G}$ on $|V|$
vertices and average degree $\Delta$ has an independent set of size
at least $|V|/(\Delta + 1)$. This, along with (21), implies that the
average degree of our graph $\mathcal{G}$ satisfies

$$\frac{|V|}{\Delta + 1} \le |I| \le \frac{16}{\epsilon}.$$

This in turn implies that

$$\Delta \ge \frac{\epsilon |V|}{16} - 1 \ge \frac{\epsilon |V|}{32}.$$

The second inequality holds for our setting of $n$, since $|V|$ is
of size at least $2^{\epsilon n/4}/2$. To summarize the above discussion, we
have shown that our graph $\mathcal{G}$ has large average degree
$\Delta \ge \epsilon |V|/32$. We now use this fact to analyze Calvin's attack.

By the definition of deterministic codes, any valid codeword
is transmitted with equal probability. Also, by definition
both $\mathbf{x}$ (the transmitted codeword) and $\mathbf{x}'$ (the codeword
chosen by Calvin) are in $V = B_{\mathbf{y}_1}$. Hence both $\mathbf{x}$ and $\mathbf{x}'$ are
uniform in $B_{\mathbf{y}_1}$. This implies that with probability $|E|/|V|^2$ the
nodes corresponding to codewords $\mathbf{x}$ and $\mathbf{x}'$ are distinct and
connected by an edge in $\mathcal{G}$. This in turn implies that with
probability $|E|/|V|^2$, $\mathbf{x} \ne \mathbf{x}'$ and $d_H(\mathbf{x}_2, \mathbf{x}_2') < 2(p - \bar{p})n - \epsilon n/8$,
as required. Now

$$\frac{|E|}{|V|^2} = \frac{\Delta |V|}{2|V|^2} \ge \frac{\epsilon}{64}.$$
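The Turán-type guarantee used in this proof, that a graph with average degree $\Delta$ has an independent set of size at least $|V|/(\Delta + 1)$, can be illustrated constructively. The Python sketch below uses the minimum-degree greedy heuristic, which is known to meet this guarantee; the greedy procedure, the random test graph, and all names are assumptions made for illustration and are not part of the proof itself.

```python
import random


def random_graph(num_vertices: int, edge_prob: float, rng: random.Random):
    """Adjacency sets of a random undirected graph (each edge kept independently)."""
    adj = {v: set() for v in range(num_vertices)}
    for u in range(num_vertices):
        for v in range(u + 1, num_vertices):
            if rng.random() < edge_prob:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def greedy_independent_set(adj):
    """Repeatedly pick a minimum-degree vertex and delete it with its neighbours."""
    remaining = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    chosen = []
    while remaining:
        v = min(remaining, key=lambda u: len(remaining[u]))
        chosen.append(v)
        removed = remaining[v] | {v}
        for u in removed:
            remaining.pop(u)
        for nbrs in remaining.values():
            nbrs -= removed
    return chosen


if __name__ == "__main__":
    rng = random.Random(0)
    adj = random_graph(40, 0.15, rng)
    avg_degree = sum(len(nbrs) for nbrs in adj.values()) / len(adj)
    independent = greedy_independent_set(adj)
    print(len(independent), len(adj) / (avg_degree + 1))
```

Read in the contrapositive, as in the proof: if every independent set has at most $16/\epsilon$ vertices, the average degree must be at least $\epsilon |V|/16 - 1$.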



The preceding claims provide the same guarantees as Claim 6
appearing in the body of the paper, and so Claim 7 follows.
Namely, w.h.p., Calvin does not "run out" of his budget of
$pn$ bit flips. We conclude by proving that, given the analysis
above, Bob cannot distinguish between the case in which $\mathbf{x}$ or
$\mathbf{x}'$ were transmitted, using a similar symmetrization argument.

Theorem 11 For any code with deterministic encoding and
decoding of rate $R = C + \epsilon$, under Calvin's "babble-and-push"
strategy the average error probability $\bar{\varepsilon}$ is lower
bounded by $\frac{\epsilon}{256}\big(1 - 2^{-\Omega(\epsilon^2 n)}\big)$.

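Proof: Let $u$ be the message chosen by Alice, $u'$ be the
message chosen by Calvin, $p(\mathbf{y}_1, u, u')$ be the joint distribution
of the output during the "babble" phase and these two
messages, and $p(\mathbf{y}_2 \mid \mathbf{y}_1, u, u')$ be the conditional distribution of
the output on the result of the "babble" phase.

Let $\mathcal{G}'$ be the set of $\mathbf{y}_2$ such that Claim 7 is satisfied for
$\mathbf{x}_2(u)$. As in the arguments of Theorem 8, Calvin's attack is
symmetric, so that

$$p(\mathbf{y}_1, u, u') = p(\mathbf{y}_1, u', u),$$

and therefore we have

$$\sum_{\mathbf{y}_2 \in \mathcal{G}'} p(\mathbf{y}_2 \mid \mathbf{y}_1, u, u') \ge 1 - 2^{-\Omega(\epsilon^2 n)}.$$

Let $\mathcal{F}$ be the set of tuples $(\mathbf{y}_1, u, u')$ satisfying Claim 9 and
Claim 10. Following the analysis in Theorem 8 and
applying Claims 9 and 10, we have

$$p(\mathcal{F}) \ge \frac{1}{2} \cdot \frac{\epsilon}{64} = \frac{\epsilon}{128}$$

and

$$2\bar{\varepsilon} \ge \sum_{(\mathbf{y}_1, u, u') \in \mathcal{F}} p(\mathbf{y}_1, u, u') \sum_{\mathbf{y}_2 \in \mathcal{G}'} p(\mathbf{y}_2 \mid \mathbf{y}_1, u, u') \ge \frac{\epsilon}{128}\big(1 - 2^{-\Omega(\epsilon^2 n)}\big).$$

Dividing both sides by 2 yields the result. ■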

V. Concluding remarks 

In this paper we presented a novel upper bound on the 
rates achievable on binary additive channels with a causal 
adversary. This model is weaker than the traditional worst- 
case error model studied in coding theory, but is stronger than 
an i.i.d. model for the noise. Indeed, our results show the
binary symmetric channel capacity $1 - H(p)$ is not achievable
against causal adversaries. By contrast, previous work shows
that a delay of $Dn$ (with $D$ a positive constant in $(0, 1]$) for
the adversary allows Alice and Bob to communicate at rate
$1 - H(p)$. Thus the causal adversary is strictly more powerful
than the delayed adversary (which in turn is no stronger than
i.i.d. noise).

To show our bound we demonstrated a new "babble-and- 
push" attack. The adversary increases the uncertainty at the 
decoder during the "babble" phase, enabling it to choose an al- 
ternative codeword during the "push" phase. The "push" phase 
succeeds because the adversary can effectively symmetrize 
the channel. We demonstrate that the upper bound presented 
herein holds against arbitrary codes, rather than simply against 
deterministic codes, as is common in the coding theory litera- 
ture. Since our analysis pertains to adversarial jamming rather 
than random noise, the proof techniques presented may be of 
independent interest in the more general setting of AVCs. 



Appendix A 
The minimization in Theorem 1

Let us denote $\bar{p}$ by $x$, and write the bound as a function of
$x$ as

$$f(x) = (1 - 4p + 4x)\left(1 - H\!\left(\frac{x}{1 - 4p + 4x}\right)\right).$$

So

$$\begin{aligned}
\frac{d}{dx} f(x) &= 4 - \frac{d}{dx}\left[(1 - 4p + 4x)\, H\!\left(\frac{x}{1 - 4p + 4x}\right)\right] \\
&= 4 + \frac{d}{dx}\left[x \log\frac{x}{1 - 4p + 4x} + (1 - 4p + 3x)\log\frac{1 - 4p + 3x}{1 - 4p + 4x}\right] \\
&= 4 + \log\frac{x}{1 - 4p + 4x} + 3\log\frac{1 - 4p + 3x}{1 - 4p + 4x} \\
&\quad + \log e\left[\frac{(1 - 4p + 4x) - 4x}{1 - 4p + 4x} + \frac{3(1 - 4p + 4x) - 4(1 - 4p + 3x)}{1 - 4p + 4x}\right] \\
&= 4 + \log\frac{x(1 - 4p + 3x)^3}{(1 - 4p + 4x)^4},
\end{aligned}$$

where the bracketed terms sum to zero because $(1 - 4p + 3x) + x = 1 - 4p + 4x$.

First, we check for any roots of $f'(x)$ in $0 < x < p$.

$$\begin{aligned}
f'(x) = 0 &\iff \frac{x(1 - 4p + 3x)^3}{(1 - 4p + 4x)^4} = \frac{1}{16} \\
&\iff \big((1 - 4p + 3x) + x\big)^4 = 16x(1 - 4p + 3x)^3 \\
&\iff (1 - 4p + 3x)^4 - 12x(1 - 4p + 3x)^3 + 6x^2(1 - 4p + 3x)^2 + 4x^3(1 - 4p + 3x) + x^4 = 0.
\end{aligned}$$

We now substitute, for brevity, $a = (1 - 4p + 3x)/x$.

$$\begin{aligned}
& a^4 - 12a^3 + 6a^2 + 4a + 1 = 0 \\
\iff\ & (a - 1)(a^3 - 11a^2 - 5a - 1) = 0.
\end{aligned}$$
We now consider two cases. If $p = 0.25$, we have that $f(x) = 0$
for $x = 0$. Thus setting $x = 0$ will yield the minimum value
for $p = 1/4$.

For $p < 0.25$, we study the minimum value given $x > 0$.
When $x > 0$ and $p < 0.25$ it holds that $1 - 4p > 0$ and $a > 3$.
Thus, for $f'(x)$ to be zero we require that $a^3 - 11a^2 - 5a - 1 = 0$,
which can be found via the general formula for cubic
equations (for instance [28, Chap. 6]) to have one real solution
and two complex conjugate solutions. The only real solution
is

$$a_0 = \frac{1}{3}\left(11 + \sqrt[3]{1592 + 24\sqrt{33}} + \sqrt[3]{1592 - 24\sqrt{33}}\right) \approx 11.4445,$$

giving $x = (1 - 4p)/(a_0 - 3) \approx (1 - 4p)/8.4445$. However,
this value is greater than $p$ for $p < 1/(a_0 + 1) \approx 1/12.4445$.
For $p \in [1/(a_0 + 1), 0.25]$, this solution is in the range $[0, p]$.
Now we will see that $f'(x)$ is negative for $0 < x < (1 - 4p)/(a_0 - 3)$.

For $p < 0.25$, and $0 < x < (1 - 4p)/(a_0 - 3)$, we have

$$a = \frac{1 - 4p + 3x}{x} > a_0 \approx 11.4445,$$

so

$$a^3 - 11a^2 - 5a - 1 \approx (a - a_0)(a^2 + 0.4445a + 0.087) > 0,$$

and since $(a - 1)(a^3 - 11a^2 - 5a - 1) > 0$ we have
$f'(x) < 0$. By the continuity of the objective function,
$f(0) = \lim_{x \to 0} f(x)$. So, $f(x)$ is decreasing in
$0 < x < (1 - 4p)/(a_0 - 3)$, and thus the optimum $\bar{p}$ is given by

$$\begin{aligned}
\bar{p} &= \min\left\{p,\ \frac{1 - 4p}{a_0 - 3}\right\} \\
&= \min\left\{p,\ \frac{3(1 - 4p)}{2 + \sqrt[3]{1592 + 24\sqrt{33}} + \sqrt[3]{1592 - 24\sqrt{33}}}\right\} \\
&\approx \min\left\{p,\ \frac{1 - 4p}{8.4445}\right\}.
\end{aligned}$$
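The closed-form root and the resulting minimizer are easy to evaluate numerically. The Python sketch below computes $a_0 = \frac{1}{3}\big(11 + \sqrt[3]{1592 + 24\sqrt{33}} + \sqrt[3]{1592 - 24\sqrt{33}}\big)$, checks that it is a root of $a^3 - 11a^2 - 5a - 1$, and returns $\min\{p, (1 - 4p)/(a_0 - 3)\}$; the function names are hypothetical and the sample values of $p$ are chosen only for illustration.

```python
import math


def cubic(a: float) -> float:
    """The cubic factor a^3 - 11a^2 - 5a - 1."""
    return a ** 3 - 11 * a ** 2 - 5 * a - 1


def real_root_a0() -> float:
    """Real root of the cubic via the closed-form (Cardano) expression."""
    s = 24 * math.sqrt(33)
    return (11 + (1592 + s) ** (1 / 3) + (1592 - s) ** (1 / 3)) / 3


def optimal_p_bar(p: float) -> float:
    """Minimizer min{p, (1 - 4p) / (a0 - 3)} for 0 <= p <= 1/4."""
    if not 0 <= p <= 0.25:
        raise ValueError("expected 0 <= p <= 1/4")
    return min(p, (1 - 4 * p) / (real_root_a0() - 3))


if __name__ == "__main__":
    a0 = real_root_a0()
    print(a0, cubic(a0))
    for p in (0.05, 0.1, 0.2, 0.25):
        print(p, optimal_p_bar(p))
```

For $p$ below $1/(a_0 + 1)$ the function returns $p$ itself, matching the observation that the stationary point lies outside $[0, p]$ in that range.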